Microswimmers can acquire information on the surrounding fluid by sensing mechanical cues. They can then navigate in response to these signals. We analyse this navigation by combining deep reinforcement learning with direct numerical simulations to resolve the hydrodynamics. We study how local and non-local information can be used to train a swimmer to achieve particular swimming tasks in a non-uniform flow field, in particular a zig-zag shear flow. The swimming tasks are learning to swim in (1) the vorticity direction, (2) the shear-gradient direction, and (3) the shear-flow direction. We find that access to lab-frame information on the swimmer's instantaneous orientation is all that is required to reach the optimal policy for tasks (1) and (2). However, information on both the translational and rotational velocities seems to be required to achieve task (3). Inspired by biological microorganisms, we also consider the case where the swimmers sense local information, i.e. surface hydrodynamic forces, together with a signal direction. This signal might correspond to gravity or, for microorganisms with light sensors, a light source. In this case, we show that the swimmer can reach a level of performance comparable to that of a swimmer with access to lab-frame variables. We also analyse the role of different swimming modes, i.e. pusher, puller, and neutral swimmers.
We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object instance annotations across 216 3D spaces and 3,100 rooms within those spaces. The scale, quality, and diversity of object annotations far exceed those of prior datasets. A key difference setting HM3DSEM apart from other datasets is the use of texture information to annotate pixel-accurate object boundaries. We demonstrate the effectiveness of the HM3DSEM dataset for the Object Goal Navigation task using different methods. Policies trained using HM3DSEM outperform those trained on prior datasets. The introduction of HM3DSEM in the Habitat ObjectNav Challenge led to an increase in participation from 400 submissions in 2021 to 1022 submissions in 2022.
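Recent years have seen a growth in user-centric applications that require effective knowledge transfer across tasks in the low-data regime. One example is personalization, where a pretrained system is adapted by learning on a small amount of labeled data belonging to a specific user. This setting requires high accuracy at low computational complexity, so the Pareto frontier of accuracy versus adaptation cost plays a crucial role. In this paper we push this Pareto frontier in the few-shot image classification setting with two key contributions: (i) a new adaptive block called Contextual Squeeze-and-Excitation (CaSE) that adjusts a pretrained neural network on a new task to significantly improve performance with a single forward pass of the user data (the context), and (ii) a hybrid training protocol based on coordinate descent, called UpperCaSE, that exploits meta-trained CaSE blocks and fine-tuning routines for efficient adaptation. UpperCaSE achieves a new state-of-the-art accuracy relative to meta-learners on the 26 datasets of VTAB+MD and on a challenging real-world personalization benchmark (ORBIT), narrowing the gap with leading fine-tuning methods at an adaptation cost that is orders of magnitude lower.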
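Modern deep learning systems are increasingly deployed in situations such as personalization and federated learning, where it is necessary to support i) learning on small amounts of data and ii) communication-efficient distributed training protocols. In this work we develop FiLM Transfer (FiT), which fulfills these requirements in the image classification setting. FiT uses an automatically configured Naive Bayes classifier on top of a fixed backbone that has been pretrained on large image datasets. Parameter-efficient FiLM layers are used to modulate the backbone, shaping the representation for the downstream task. The network is trained via an episodic fine-tuning protocol. The approach is parameter efficient, which is key for enabling few-shot learning, inexpensive model updates for personalization, and communication-efficient federated learning. We experiment with FiT on a wide range of downstream datasets and show that it achieves better classification accuracy than the state-of-the-art Big Transfer (BiT) algorithm in the low-shot regime and on the challenging VTAB-1k benchmark, with fewer than 1% of the updateable parameters. Finally, we demonstrate the parameter efficiency of FiT in distributed low-shot applications, including model personalization and federated learning, where model update size is an important performance metric.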
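As a rough illustration of the modulation idea mentioned above, the following minimal sketch applies a FiLM-style feature-wise scale and shift to a toy feature map. The function name `film`, the list-of-channels layout, and the toy values are assumptions made for this example; the abstract does not specify where FiT places these layers in the backbone or how their parameters are learned.

```python
from typing import List


def film(features: List[List[float]],
         gamma: List[float],
         beta: List[float]) -> List[List[float]]:
    """Feature-wise linear modulation: y[c][i] = gamma[c] * x[c][i] + beta[c].

    `features` holds one list of activations per channel (hypothetical layout).
    Only gamma and beta (two numbers per channel) would be task-specific.
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Toy feature map: two channels with three activations each.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# gamma = 1 and beta = 0 leave the frozen backbone's features unchanged.
identity = film(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])

# A different (gamma, beta) pair reshapes the representation per channel.
modulated = film(features, gamma=[2.0, 0.5], beta=[1.0, -1.0])
print(modulated)
```

Because only two scalars per channel change, an update of this kind stays small relative to the backbone, which is consistent with the parameter-efficiency argument in the abstract.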
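We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack - data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spaces) with articulated objects (e.g. cabinets and drawers that can open and close); (ii) H2.0: a high-performance physics-enabled 3D simulator with speeds exceeding 25,000 simulation steps per second (850x real time) on an 8-GPU node, representing a 100x speed-up over prior work; and (iii) the Home Assistant Benchmark (HAB): a suite of common tasks for assistive robots (tidying the house, preparing groceries, setting the table) that tests a range of mobile manipulation capabilities. These large-scale engineering contributions allow us to systematically compare large-scale reinforcement learning (RL) and classical sense-plan-act (SPA) pipelines in long-horizon structured tasks, with an emphasis on generalization to new objects, receptacles, and layouts. We find that (1) flat RL policies struggle on HAB compared to hierarchical ones; (2) a hierarchy with independent skills suffers from "hand-off problems"; and (3) SPA pipelines are more brittle than RL policies.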
We demonstrate a proof of concept of a large language model conducting corporate-lobbying-related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine whether proposed U.S. Congressional bills are relevant to specific public companies and to provide explanations and confidence levels. For the bills the model deems relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. We also test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022. The performance of text-davinci-002 is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve their core natural language understanding capabilities, performance on corporate-lobbying-related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.
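The comparison against "the baseline of predicting the most common outcome" can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the helper names and the ten toy labels and predictions are invented for the example and do not reproduce any result from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_baseline_accuracy(labels: Sequence[Hashable]) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


def accuracy(predictions: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical ground truth: most (bill, company) pairs are irrelevant.
labels = ["irrelevant"] * 7 + ["relevant"] * 3

# Hypothetical model output with one false positive and one false negative.
predictions = ["irrelevant"] * 6 + ["relevant"] + ["relevant"] * 2 + ["irrelevant"]

baseline = majority_baseline_accuracy(labels)
model_acc = accuracy(predictions, labels)
print(f"baseline={baseline:.2f} model={model_acc:.2f} beats_baseline={model_acc > baseline}")
```

On imbalanced labels like these, raw accuracy is only meaningful relative to this baseline, which is why a model can score well in absolute terms and still do worse than always answering "irrelevant".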
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with predicting the future motion of "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry, sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
In this paper we derive a PAC-Bayesian-like error bound for a class of stochastic dynamical systems with inputs, namely linear time-invariant stochastic state-space models (stochastic LTI systems for short). This class of systems is widely used in control engineering and econometrics; in particular, it represents a special case of recurrent neural networks. In this paper we 1) formalize the learning problem for stochastic LTI systems with inputs, 2) derive a PAC-Bayesian-like error bound for such systems, and 3) discuss various consequences of this error bound.
We demonstrate how efficient autonomous drone swarms can be in detecting and tracking occluded targets in densely forested areas, such as lost people during search and rescue missions. Exploration and optimization of local viewing conditions, such as occlusion density and target view obliqueness, provide much faster and much more reliable results than previous, blind sampling strategies that are based on pre-defined waypoints. An adapted real-time particle swarm optimization and a new objective function are presented that are able to deal with dynamic and highly random through-foliage conditions. Synthetic aperture sensing is our fundamental sampling principle, and drone swarms are employed to approximate the optical signals of extremely wide and adaptable airborne lenses.
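For readers unfamiliar with particle swarm optimization, the sketch below shows the textbook variant on a toy problem. It is not the adapted real-time algorithm or the objective function of the paper: the sphere function stands in for a measure of viewing conditions, and every name and parameter value (`pso_minimize`, `inertia`, `c_personal`, `c_global`, the bounds, the seed) is an assumption chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[Sequence[float]], float],
                 dim: int,
                 n_particles: int = 20,
                 n_iters: int = 100,
                 bounds: Tuple[float, float] = (-5.0, 5.0),
                 inertia: float = 0.7,
                 c_personal: float = 1.5,
                 c_global: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float, List[float]]:
    """Textbook PSO: returns (best position, best value, best-value history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm
    history = [gbest_val]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x: Sequence[float]) -> float:
    """Toy stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val, history = pso_minimize(sphere, dim=2)
print(best_pos, best_val)
```

In a swarm setting each particle would correspond to a drone and the objective to a locally measured quantity, but mapping the update rule onto flight constraints and changing conditions is exactly the adaptation the abstract refers to and is not shown here.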